[57] ABSTRACT

The invention relates to an automated directory assistance
system that utilizes an a priori advisor for predicting the most
likely requested locality. The automated directory assistance
system includes a speech recognition dictionary containing
a plurality of orthographies, each orthography corresponding
to the name of a locality in which a subscriber whose telephone
number is sought by the user of the automated directory
assistance system may reside. Upon reception of the
spoken utterance, the system performs a first pass search that
scores the orthographies in the speech recognition dictionary
on the basis of their acoustic characteristics, each orthography
having a certain likelihood of being a match to the
spoken utterance. The orthographies are then weighed on
the basis of information indicative of the geographical
location of the user. A final re-scoring operation may then be
performed on the top N candidates in the weighed list. This
system improves recognition accuracy by combining the
acoustic match search with a probabilistic bias
derived from statistical information on calling patterns in the
population.

31 Claims, 4 Drawing Sheets 
AUTOMATED DIRECTORY ASSISTANCE SYSTEM UTILIZING PRIORI
ADVISOR FOR PREDICTING THE MOST LIKELY REQUESTED LOCALITY

FIELD OF THE INVENTION

This invention relates to a method and an apparatus for
automatically performing desired actions in response to
spoken requests. It is particularly applicable to a method and
an apparatus for automatically providing desired information
in response to spoken requests, as may be used to
partially or fully automate telephone directory assistance
functions.

BACKGROUND OF THE INVENTION

In addition to providing printed telephone directories,
telephone companies provide telephone directory assistance
services. Users of these services call predetermined telephone
numbers and are connected to directory assistance
operators. The operators access directory databases to locate
the directory listings requested by the users, and release the
telephone numbers of those listings to the users.

Because telephone companies handle a very large number
of directory assistance calls per year, the associated labor
costs are very significant. Consequently, telephone companies
and telephone equipment manufacturers have devoted
considerable effort to the development of systems that
reduce the labor costs associated with providing directory
assistance services.

In a typical directory assistance system the caller is first
prompted to provide locality information, in other words to
specify the area in which the business or individual whose
telephone number he seeks resides. If valid speech is detected,
the speech recognition layer is invoked in an attempt to
recognize the unknown utterance. On a first pass search, a fast
match algorithm is used to select the top N orthography
groups from a speech recognition dictionary. In a second
pass the individual orthographies from the selected groups
are re-scored using a more precise likelihood computation.
The top orthography in each of the top two groups is then
processed by a rejection algorithm that evaluates whether they
are sufficiently distinct from one another for the top choice
candidate to be considered a valid recognition.

The signal processing operation described above is based
solely on an acoustic analysis of the spoken utterance. This
sometimes does not allow the system to reach a resolution.
Indeed, the wide variety of accents that exist in the population
and, more particularly, the manner in which individuals
formulate requests result in situations in which
correct word recognition cannot be made solely on the basis
of acoustic match. Thus, there is a need in the industry to
provide a speech recognition system that utilizes additional
elements of information that, when combined with the acoustic
match, improve speech recognition accuracy.

OBJECTS AND STATEMENT OF THE INVENTION

A principal object of the invention is to provide a speech
recognition system with improved speech recognition accuracy,
particularly well suited for use in an automated directory
assistance system.

Another object of the invention is to provide an improved
method for performing speech recognition, particularly well
suited to the context of locality recognition.

Another object of this invention is to provide a computer
readable storage medium containing a program element that
directs a computer to perform speech recognition, the program
element being designed to improve the speech recognition.

As embodied and broadly described herein the invention
provides an automated directory assistance system comprising:

a) a speech recognition dictionary including a plurality of
orthographies potentially recognizable on a basis of a
spoken utterance by a user of said automated directory
assistance system, each orthography being indicative of
a locality in which an entity whose telephone number is
potentially sought by the user may reside;

b) means for extracting from said speech recognition
dictionary, on the basis of the spoken utterance by the
user, a list including a plurality of orthographies, each
of said plurality of orthographies being a candidate
having a certain probability to correspond to the spoken
utterance;

c) means for weighing candidates in said list on a basis of
information indicative of a geographical location of the
user of said automated directory assistance system.

For the purpose of this specification the expression
"orthography" designates a data element that can be mapped
onto a spoken utterance that can form a single word or a
combination of words.

For the purpose of this specification the expression
"dictionary" designates a data structure containing orthographies
that can be mapped onto a spoken utterance on the basis of
acoustic characteristics and, optionally, a priori probabilities
or another rule, such as a linguistic or grammar model.

In a most preferred embodiment of this invention, the
automated directory assistance system is integrated into a
telephone network that enables users to formulate requests
by using subscriber terminal equipment such as mobile or
fixed telephone sets. Once the automated directory assistance
system receives a request from the user, it will first
issue a prompt over the telephone network requesting the
user to specify the locality in which the telephone number he
seeks is located. If valid speech is detected in response to
this prompt, a speech recognition layer is invoked that
selects from a speech recognition dictionary an orthography
that is most likely to match the spoken utterance. The speech
recognition process is essentially a three step operation. The
first step, usually referred to as "first pass search", consists
of scoring all the orthographies in the speech recognition
dictionary by performing a rough estimation on the basis of
acoustic characteristics. Following this step, a procedure is
performed that will change or alter the probability
of one or more candidates in the list on the basis of
information other than just the acoustic match between the
spoken utterance and the orthographies. In a specific
example, the probability of each candidate in the output list
is conditioned on geographical information
relating to the location from which the user has formulated
the request. This information can be valuable in correctly
recognizing the locality name since requests for automated
directory assistance are likely to follow predetermined call
patterns. By utilizing screened tokens (observing actual call
records) and actual unscreened call records, statistical
information can be gathered to model the calling patterns. This
statistical information can then be used in conjunction with
acoustic matching between the spoken utterance and
orthographies in the speech recognition dictionary to
improve the accuracy of the speech recognition operation.
The calling number can be used to determine the geographical
location of the user. In a very specific example, the
first six digits of a ten-digit calling number
are used (the first three digits represent the area
code). This information, usually referred to as "NPA-NXX",
can be correlated to various orthographies from the speech
recognition dictionary, which in turn are associated with
respective probability values. In one specific embodiment,
the speech recognition system is provided with a plurality of
data structures, herein referred to as histograms, each
histogram being associated with a certain calling NPA-NXX
combination. Each data structure contains indices or
pointers to orthographies in the speech recognition
dictionary, each index or pointer in the data structure being
associated with a certain probability value that is established
on the basis of observed call patterns. The number of data
structures available depends on the number of NPA-NXX
combinations available in the network. If the number of
those combinations is too high, only the combinations
that occur most often need be used. If an NPA-NXX
combination is encountered that is not programmed in the system,
a default behavior can be designed to handle those
situations. This will become apparent further on in the
description.
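The histogram store described above can be pictured as a small lookup structure. The following is a minimal Python sketch, assuming ten-digit calling numbers and histograms keyed by the six-digit NPA-NXX; the class name `PriorAdvisor`, the integer orthography indices and the uniform `default` histogram (standing in for the unspecified default behavior) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PriorAdvisor:
    """Hypothetical store of per-NPA-NXX histograms.

    Each histogram maps an orthography index (its position in the speech
    recognition dictionary) to a probability estimated from observed calls.
    """
    histograms: dict[str, dict[int, float]] = field(default_factory=dict)
    # Fallback used when a caller's NPA-NXX is not programmed in the system.
    default: dict[int, float] = field(default_factory=dict)

    @staticmethod
    def npa_nxx(calling_number: str) -> str:
        """Return the first six digits (NPA-NXX) of a ten-digit number."""
        digits = "".join(ch for ch in calling_number if ch.isdigit())
        if len(digits) != 10:
            raise ValueError(f"expected ten digits, got {calling_number!r}")
        return digits[:6]

    def histogram_for(self, calling_number: str) -> dict[int, float]:
        """Histogram for the caller's NPA-NXX, or the default one."""
        return self.histograms.get(self.npa_nxx(calling_number), self.default)


# Invented probabilities for orthography indices 0..2 (illustration only).
advisor = PriorAdvisor(histograms={"514421": {0: 0.6, 1: 0.3, 2: 0.1}},
                       default={0: 1 / 3, 1: 1 / 3, 2: 1 / 3})
print(advisor.histogram_for("514-421-7563"))  # {0: 0.6, 1: 0.3, 2: 0.1}
print(advisor.histogram_for("613-555-0100"))  # not programmed: default
```

Only the most frequent NPA-NXX combinations need an entry in `histograms`; every other caller falls through to `default`.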

Once the orthographies in the dictionary have been scored
as a result of the first pass search, the NPA-NXX combination
associated with the calling number/called number is
obtained and the corresponding histogram is retrieved. This
histogram may be in the form of a table that contains two
columns, each record thus including two separate fields,
namely a locality identifier (which may be the name of the
locality, or an index or a pointer corresponding to an
orthography from the speech recognition dictionary) and an
associated probability value. For each orthography in the
dictionary a compound probability estimate is computed using
the probability stored in this histogram. This computation
constitutes an example of the weighing operation referred to
earlier in the broad definition of the invention. In general, the
weighing operation can be defined as a procedure that has
the effect of impressing a certain bias over one or more
orthographies in the speech recognition dictionary, the bias
being dependent upon information indicative of the
geographical location of the site from which the user has input
the spoken utterance and upon the called directory assistance
number. The result of this bias is to give one orthography a
higher chance than another during the process of selecting
the orthography that will be output as the best possible
match to the spoken utterance.
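One way to picture the weighing operation is a function that biases each first-pass score by the histogram probability of the same locality. The specification does not say how the compound estimate is formed; the sketch below assumes, purely for illustration, a product of the two values, with a small hypothetical `floor` for localities absent from the histogram so that they are penalized rather than eliminated. The first-pass scores resemble those of the worked example later in this description, and the histogram values are invented.

```python
def weigh(first_pass: dict[str, float],
          histogram: dict[str, float],
          floor: float = 1e-3) -> list[tuple[str, float]]:
    """Bias first-pass acoustic scores with a geographical prior.

    Assumption (not from the specification): compound = acoustic score
    multiplied by the prior, with `floor` used for localities missing from
    the histogram. Returns (orthography, compound score) pairs, best first.
    """
    compound = {name: score * histogram.get(name, floor)
                for name, score in first_pass.items()}
    return sorted(compound.items(), key=lambda item: item[1], reverse=True)


# Illustrative first-pass scores and an invented histogram for one caller.
first_pass = {"Lasalle": 0.8, "Laval": 0.75, "Montreal": 0.6, "Quebec": 0.55}
histogram = {"Laval": 0.5, "Montreal": 0.3, "Lasalle": 0.1, "Quebec": 0.1}
for name, score in weigh(first_pass, histogram):
    print(f"{name:10s}{score:.3f}")
```

With these numbers Laval (0.375) moves ahead of Lasalle (0.080), even though Lasalle had the better rough acoustic score.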

At this point, one possibility is to order the list of
orthographies by decreasing compound probability
value and select the candidate possessing the highest
compound probability value as the orthography that
presents the best match for the spoken utterance. A more
refined approach, which constitutes the third processing
step mentioned earlier, is to select from the ordered list of
orthographies the top N scoring orthographies and perform
a detailed acoustic match analysis for every orthography in
this sub-group in order to establish a final ranking. In this
approach, the weighing operation based on probabilities in
the histogram influences the selection of candidates to be
submitted to the re-scoring stage. The re-scoring operation
uses more precise computations and selects the most likely
candidate. During this computation the weighing operation
based on probability scores in the histograms has no effect
on which candidate will be output as top choice, since the
selection performed at the re-scoring stage uses acoustic
match criteria only. Optionally, it may be desirable to
include the a priori probability scores in the re-scoring
stage in order to weight the orthographies that occur
frequently in a preferential fashion.
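The refined third step can be sketched as follows, under stated assumptions: `detailed_score` is a hypothetical stand-in for the precise acoustic match (for example a Viterbi alignment), and the optional `prior` argument corresponds to the variant that reuses the a priori probabilities at the re-scoring stage, again assuming a simple product. All scores in the usage lines are invented.

```python
from typing import Callable, Mapping, Optional


def rescore_top_n(compound: Mapping[str, float],
                  detailed_score: Callable[[str], float],
                  n: int,
                  prior: Optional[Mapping[str, float]] = None) -> str:
    """Return the final choice among the top-N weighed candidates.

    compound:       compound scores produced by the weighing step
    detailed_score: hypothetical precise acoustic scorer, run only on
                    the shortlist
    prior:          if given, a priori probabilities are also applied at
                    this stage (optional variant, assumed product);
                    otherwise the choice is made on acoustics alone
    """
    # The weighing step decides which candidates reach re-scoring.
    shortlist = sorted(compound, key=compound.get, reverse=True)[:n]

    def final_score(name: str) -> float:
        score = detailed_score(name)
        return score * prior.get(name, 0.0) if prior is not None else score

    return max(shortlist, key=final_score)


# Invented values: weighing has pushed Laval ahead of Lasalle.
compound = {"Laval": 0.375, "Montreal": 0.18, "Lasalle": 0.08, "Quebec": 0.055}
acoustic = {"Laval": 0.7, "Montreal": 0.5, "Lasalle": 0.9, "Quebec": 0.4}
print(rescore_top_n(compound, acoustic.get, n=2))  # Laval
print(rescore_top_n(compound, acoustic.get, n=3))  # Lasalle
```

With N = 2 the weighing keeps Lasalle out of the shortlist, so re-scoring returns Laval; with N = 3 Lasalle re-enters and the purely acoustic re-score picks it. Passing `prior` restores the geographical bias at the final stage.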

As embodied and broadly described herein the invention 
also provides a method for performing speech recognition in 
an automated directory assistance system, said method com- 
prising the steps of: 

a) providing a speech recognition dictionary including a
plurality of orthographies potentially recognizable on a
basis of a spoken utterance by a user of said automated
directory assistance system, each orthography being
indicative of a locality in which an entity whose
telephone number is potentially sought by the user may
reside;

b) receiving a spoken utterance by the user of the
automated directory assistance system;

c) searching said dictionary to derive a list of
orthographies, each orthography in said list being a
candidate having a certain probability to be a match to
the spoken utterance;

d) assigning to at least one of the candidates in said list a
probability value established on a basis of information
indicative of a geographical location of the user of said
automated directory assistance system.

As embodied and broadly described herein the invention 
further provides a method for performing speech recognition 
in an automated directory assistance system, said method 
comprising the steps of: 

a) providing a speech recognition dictionary including a 
plurality of orthographies potentially recognizable on a 
basis of a spoken utterance by a user of said automated 
directory assistance system, each orthography being 
indicative of a locality in which an entity whose
telephone number is potentially sought by the user may
reside;

b) receiving a spoken utterance by the user of the
automated directory assistance system;

c) searching said dictionary to derive a list of
orthographies, each orthography in said list being a
candidate having a certain probability to be a match to the
spoken utterance;

d) obtaining an identifier indicative of a geographical 
location of a terminal at which the user has input the 
spoken utterance; 

e) utilizing said identifier to rank the candidates derived at 
step c in terms of likelihood of potential match with the 
spoken utterance. 

As embodied and broadly described herein the invention 
further provides a machine readable storage medium con- 
taining a program element for instructing a computer for 
selecting at least one orthography from a speech recognition 
dictionary as being a likely match to a given spoken 
utterance, said computer including: 

a) first memory means containing said speech recognition 
dictionary; 

b) a processor in operative relationship with said first 
memory means; 

c) said program element providing means for: 

i) directing said processor to select from said speech
recognition dictionary a plurality of orthographies,
said plurality of orthographies forming a list of
candidates, each candidate having a certain probability
to correspond to the spoken utterance;

ii) directing said processor to weigh candidates in said
list on a basis of an input indicative of a geographical
location of the user of said automated directory
assistance system.

06/04/2004,, EAST Version: 1.4.1

6,122,361

In a somewhat different aspect of this invention, the scored
orthographies obtained as a result of the first pass search can
be weighed on the basis of information related to the call
destination rather than the call origin, as described earlier in
connection with a specific example. Typically, automated
directory assistance systems can be accessed by dialing one
of a series of possible telephone numbers assigned by the
telephone companies to this function. Each telephone number
is assigned a specific region in a large geographical area
such as a large city, province or a country. Thus, a user
desirous of obtaining a telephone number in a locality close
to his residence dials one specific telephone number. A
typical telephone number that is used for this function on the
North American continent is 411. On the other hand, if the
user desires to obtain directory assistance for a locality
situated far from his residence, a different telephone number
is used. For example, in the province of Quebec, the
telephone number 555-1212 can be used, preceded by the
appropriate NPA. For the province of Quebec, three NPAs
exist, namely 514, 418 and 819.

The NPA of the telephone number that the user dials
when he wishes to access automated directory assistance
functions can provide some general indication of the
geographical relationship or distance between the site at
which the user is formulating the request and the locality that
he seeks. Take as an example a situation where the user dials
514-555-1212. One can then assume that, since this number
has been dialed, a locality in the geographical area within the
boundary in which the 514 NPA is effective is being sought.
Thus, localities within that boundary can be given a higher
probability, while localities outside of that boundary can be
selectively penalized.

This approach enhances the traditional acoustic
match recognition procedure used to effect speech
recognition. Objectively, the information relating to the NPA
dialed by the user is less determinative than the information
relating to the source of the call (NPA-NXX). Thus, it is
preferable to utilize the calling number data when weighing
the orthographies in the speech recognition dictionary. In
some situations, however, the calling number data may not
be available, or statistical information for the particular
NPA-NXX combination may not be provided in the system.
For those instances, the probability data derived from the
called number can be utilized.

As embodied and broadly described herein, the invention
provides an automated directory assistance system comprising:

a) a speech recognition dictionary including a plurality of
orthographies potentially recognizable on a basis of a
spoken utterance by a user of said automated directory
assistance system, each orthography being indicative of
a locality in which an entity whose telephone number is
potentially sought by the user may reside;

b) means for detecting at least a portion of a telephone
number dialed by the user to access a directory
assistance call function;

c) means responsive to said at least a portion of said
telephone number dialed by the user and to the spoken
utterance for determining a probability value for at least
one of said orthographies, the probability value being
indicative of a likelihood of match between said at least
one of said orthographies and the spoken utterance.

As embodied and broadly described herein, the invention
further provides a method for at least partially automating
directory assistance in a telephone system, said method
comprising the steps of:

providing a plurality of orthographies potentially
recognizable on a basis of a spoken utterance by a user of
said automated directory assistance system, each
orthography being indicative of a locality in which an
entity whose telephone number is potentially sought by
the user may reside;

detecting a spoken utterance by a user;

detecting at least a portion of a telephone number dialed
by the user to access a directory assistance call function;

selecting, at least in part on the basis of the spoken
utterance and at least in part on a basis of said at least
a portion of a telephone number dialed by the user, at
least one of the orthographies as being a probable
match to the spoken utterance.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a block diagram of a general speech
recognition system;

FIG. 2 shows a prior art speech recognition system;

FIGS. 3, 4 & 5 are flowcharts illustrating the operation of
a speech recognition apparatus utilizing the histograms
generated with the method and apparatus in accordance with
the invention.

DESCRIPTION OF PREFERRED EMBODIMENTS

Speech recognition systems have been developed in many
parts of the world and, although it is difficult to describe a
standard recognition system architecture, some
characteristics are shared between many of them. A typical
speech recognition system, of the type depicted in FIG. 1,
generally comprises a device such as a microphone or
telephone set 101 to convert the spoken utterance into an
electric signal and transmit the signal to the speech
recognition unit 100. The speech recognition unit 100 can be
split into two functional blocks, namely a pre-processing
block 102 and a search unit 104. The pre-processing unit 102,
also called the acoustic processor, performs the segmentation,
the normalisation and the parameterisation of the input signal
waveform. In some cases, especially for connected word
speech, this stage may also include a feature extraction
operation. The search block 104 includes a speech recognition
dictionary that is scored in order to find possible matches to
the spoken utterance. The search may be done in several steps
in order to maximise the probability of obtaining the correct
result in the shortest possible time and most preferably in
real-time.

More specifically, the purpose of the pre-processing block
102, illustrated in greater detail in FIG. 2, is first to translate
the incoming analog speech waveform into digital format.
This can be done with the use of a simple A/D converter, a
spectrogram generator or any other suitable technique. The
input signal is then split into short segments called analysis
frames whose typical duration ranges from 5 to 20 ms. All
further processing is done relative to these frames. In
general, the pre-processing block comprises a normalisation
sub-block 200, a parameterisation sub-block 202 and an
endpoint detection sub-block 206. The normalisation
sub-block 200 adjusts the maximum signal amplitude of each
analysis frame to a standard level in order to take into
account variations in speech intensity, transmission losses
and other physical effects such as distance from the
microphone and recording level. The parameterisation
sub-block 202 typically represents speech frames in terms of voicing
decision, amplitude and fundamental frequency. A wide
variety of parameters can be used in the parameterisation
block, the most common being LPC coefficients, Mel-based
cepstral coefficients, energies in a channel vocoder and zero
crossing rate in a band-pass channel. The final sub-block of
the pre-processing module, the endpoint detection or
segmentation sub-block 206, splits the input signal waveform
into start and end of the speech utterance. This stage uses
algorithms whose purpose is to locate the boundaries
between silence and speech. In continuous and connected
speech recognition, the endpoint detection stage is only used
to provide a crude estimate of speech boundaries. In the
1980s, most systems used the short-term energy and the
zero crossing rate as an indication of the beginning or end of
a word. Currently, endpoint detection units use many
parameters including frame energy, frame voice labels and
other statistical variance parameters derived from speech.

The search functional block 104, shown in more detail in
FIG. 2, ranks all the orthographies in a dictionary so as to
be able to derive the orthography or orthographies which
have the highest probability of matching the spoken
utterance. This block comprises three functional layers of
speech processing and a dictionary. The purpose of
performing the search in three separate stages is to improve
the performance in terms of computation and speed. The first
rough calculation stage 208, also called first pass search
stage, allows the system to eliminate those orthographies that
are most unlikely to constitute a match to the spoken
utterance. For these orthographies, the exact score assigned
by a more precise calculation (e.g. Viterbi) would serve no
useful purpose. However, the time saved by performing a
simpler calculation improves the performance in speed of the
system by several orders of magnitude.

More specifically, the first pass search stage 208 performs
some rough probabilistic calculations and extracts from the
speech recognition dictionary 218 a list of possible
candidates for the spoken utterance. Typical algorithms that
can be used at this stage include the fast score estimation and
the graph search algorithms. As a reference, the reader is
invited to consult Gupta V. N., Lennig M., Mermelstein P.,
"A fast search strategy in a large vocabulary word
recogniser", INRS-Telecommunications, J. Acoust. Soc. Am.
84 (6), December 1988, p. 2007, and U.S. Pat. No. 5,515,475
by inventors Gupta V. N. & Lennig M. The content of these
documents is incorporated herein by reference.

The second layer, often called the re-score stage 210,
performs more precise calculations but only on the top N
candidates in the list supplied by the first pass search. At this
stage, techniques such as the Viterbi algorithm with complete
allophone models and model distances will be used.
Although these require heavy, complex computations, the
number of candidates for which the computation must be
performed has been greatly reduced. The result of the
re-score stage is a short list of orthographies with their
associated exact scores (probabilities of being a match to the
spoken utterance). The two highest-ranking orthographies in
the list are then typically transferred to the rejection stage
212.

The rejection stage 212 compares the two top orthographies
obtained by the re-score stage 210 and, according to a
chosen threshold, determines whether a possible correct
mapping was found or whether there is confusion between the
two top orthographies. If the difference between the two top
orthographies is less than the threshold, the system
may abort the operation on the basis that a resolution
between the two orthographies cannot be made. On the other
hand, if the difference between the orthographies is
sufficient, the one having the highest score is output as the
best choice. As a reference, the reader is invited to consult
U.S. Pat. No. 5,097,509 by inventor Lennig M., entitled
"A Rejection Method for Speech Recognition".

The speech recognition dictionary 218 used in the above
described procedure can be organised in numerous ways.
The dictionary may be stored in the form of a graph where
the links between nodes are words with their associated
probabilities. The organisation of the dictionary can have a
significant impact on the performance of the speech
recognition system.

A simple example of the operation of a prior art speech
recognition system will make its functioning clearer. Let us
assume that the speech recognition dictionary consists of the
following list of orthographies, where each entry is
indicative of a locality potentially requested by the user:

Dictionary
Montreal
Laval
Lasalle
Quebec
Ottawa

Assume that the input speech was "Laval". The first stage of
the search 208, after reordering the results, might yield the
following candidate list:

Locality    Probability of match to the spoken utterance (rough estimate)
Lasalle     0.8
Laval       0.75
Montreal    0.6
Quebec      0.55

As shown in the above table, the first pass search stage
scores all the orthographies in the dictionary and then selects
the top N scores in the graph. In this example, consider only
the top three scores.

This list is then passed to the re-score stage 210, which
calculates more precise likelihoods for each candidate. Note
that the re-scoring is performed only for the candidates in the
list. In a real world situation the list is much longer, typically
containing between 6 and 30 entries. The re-scoring is
effected only on the top N candidates, N typically ranging
from 6 to 30. The results of the re-scoring stage could be the
following:

Locality    Probability of match with the spoken utterance (exact calculation)
Lasalle     0.85
Laval       0.78
Montreal    0.6

The two top scores in the previous table are sent to the
rejection layer 212, which computes the likelihood of the top
choice being correct using the P3 rejection algorithm (for
more details on this algorithm see U.S. Pat. No. 5,097,509).
In simple terms, this algorithm computes the following:

IF (rejection value > Threshold Value)
    Submit top score as answer
ELSE
    Invoke default procedure, such as passing the matter to a
    human operator.
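The worked example above can be replayed end to end with the values from its two tables. In this Python sketch the rejection test is a plain difference between the two best re-scored candidates, compared against a threshold as in the description of rejection stage 212; it is not the P3 algorithm of U.S. Pat. No. 5,097,509, and both threshold values are arbitrary. Ottawa is omitted because the example gives no first-pass score for it.

```python
from typing import Callable, Optional


def recognise(first_pass: dict[str, float],
              rescore: Callable[[str], float],
              n: int,
              threshold: float) -> Optional[str]:
    """Prior-art three-stage search: first pass, re-score, rejection.

    Returns the accepted orthography, or None when the top two re-scored
    candidates differ by less than `threshold` (the caller would then
    invoke the default procedure, e.g. a human operator). The rejection
    test is a simple score difference, not the P3 algorithm.
    """
    # Stage 1 (first pass): keep the N best rough scores.
    shortlist = sorted(first_pass, key=first_pass.get, reverse=True)[:n]
    # Stage 2 (re-score): precise scores for the shortlist only.
    exact = sorted(((rescore(c), c) for c in shortlist), reverse=True)
    if not exact:
        return None
    # Stage 3 (rejection): accept only a clear winner.
    if len(exact) == 1 or exact[0][0] - exact[1][0] >= threshold:
        return exact[0][1]
    return None


# Values from the two tables of the example; the spoken input was "Laval".
first_pass = {"Lasalle": 0.8, "Laval": 0.75, "Montreal": 0.6, "Quebec": 0.55}
exact_scores = {"Lasalle": 0.85, "Laval": 0.78, "Montreal": 0.6}
print(recognise(first_pass, exact_scores.get, n=3, threshold=0.1))   # None
print(recognise(first_pass, exact_scores.get, n=3, threshold=0.05))  # Lasalle
```

With a threshold of 0.1 the call is rejected and would go to an operator; with 0.05 the system accepts "Lasalle" although the speaker said "Laval". In this example the acoustic scores alone do not separate the two localities reliably.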

Speech recognition units of the type described earlier 
have been used in the past for locality recognition in 
automated directory assistance systems. Such systems are 
usually integrated into a telephone network allowing users to 
formulate requests from terminal subscriber equipment such 
as fixed or mobile telephone sets. In the normal course of 
providing the directory assistance function, those systems 
prompt the user to indicate the locality in which the entity
whose telephone number is being sought resides. Accurate
locality recognition is a crucial step in the success of the
operation, since each locality is associated with an individual
speech recognition dictionary that contains the names of the
entities that can be recognized by the system. Thus, should
the wrong locality be output as the top choice by the speech
recognition layer, the remaining processing of the automated
directory assistance function is likely to fail, since the
wrong choice of locality implies that the wrong speech
recognition dictionary of entity names will be invoked
during the post-locality processing.

The present inventor has made the unexpected discovery
that the accuracy of the speech recognition system as it
relates to locality recognition can be significantly improved
by utilizing, in the recognition process, data indicative of
the location of the user and of the called directory
assistance number.

This makes it possible to increase the recognition accuracy by
taking into account statistical data derived from calling
patterns. In a specific example, the calling patterns may
indicate that a user residing in a certain locality is more
likely to request the telephone number of an entity residing in
locality A than in locality B. If localities A and B have
a similar acoustic structure, the a priori advisor based on
geographical location can be used to bias one locality more
than the other and thus provide a resolution.

In a most preferred embodiment, the probability that a
certain locality is the one matching the spoken utterance is
conditioned on two separate elements, namely the calling
NPA-NXX and the called number. This a priori estimate is
usually expressed as
$P(\text{called locality} \mid \text{calling NPA-NXX}, \text{called number})$.
In the province of Quebec, the called number can be either
411, 555-1212 or NPA-555-1212. Therefore the a priori
estimates are reduced to three possible elements:

A) $P(\text{called locality} \mid \text{calling NPA-NXX}, \text{411 or 555-1212})$,

B) $P(\text{called locality} \mid \text{calling NPA-NXX}, \text{NPA-555-1212})$, and

C) $P(\text{called locality} \mid \text{called NPA})$.

To estimate the probabilities for cases A, B and C above, the speech recognition dictionary has a number of histograms that establish a relationship between localities and probability data. Preferably, a histogram is estimated for each relevant NPA-NXX combination or called NPA. The following example illustrates this.

The speech recognition system in this example comprises a set of histograms that can be addressed on the basis of a histogram identifier stored in a lookup table, shown in Table 1 below. These indices depend on both the called number and the calling NPA-NXX or NPA. The left column corresponds to the called number, and the right column lists calling NPA-NXX combinations or NPAs, each followed by a histogram index. For example, if the called number were 411 and the calling number 514-421-7563, histogram 5 would be selected.



TABLE 1

Sample of a priori table with indices to the histograms

Called Number       Calling NPA-NXX or NPA: histogram index
411 or 555-1212     514-620:4; 514-421:5; 819-829:6; 418-621:7; 418:1; 514:2; 819:3
418-555-1212        418-621:7; 418:1
514-555-1212        514-620:4; 514-421:5; 514:2
819-555-1212        819-829:6; 819:3
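One way to read Table 1 as a data structure is sketched below in Python. The sketch assumes that the row is chosen from the called number, that the calling NPA-NXX is tried first within that row, and that the NPA entry is used as a fallback, where the NPA is that of the called number (taken to be the caller's own NPA when 411 or 555-1212 is dialed). The names `A_PRIORI_TABLE`, `LOCAL_NUMBERS` and `histogram_index` are hypothetical.

```python
from typing import Dict, Optional

# Table 1 as nested dictionaries: called number -> {calling NPA-NXX or NPA: index}.
A_PRIORI_TABLE: Dict[str, Dict[str, int]] = {
    "411 or 555-1212": {"514-620": 4, "514-421": 5, "819-829": 6,
                        "418-621": 7, "418": 1, "514": 2, "819": 3},
    "418-555-1212": {"418-621": 7, "418": 1},
    "514-555-1212": {"514-620": 4, "514-421": 5, "514": 2},
    "819-555-1212": {"819-829": 6, "819": 3},
}

LOCAL_NUMBERS = {"411", "555-1212"}


def histogram_index(called: str, calling: str) -> Optional[int]:
    """Return the histogram index for a call, or None if Table 1 has no entry.

    Numbers are assumed to be formatted as in the text, e.g. "514-421-7563".
    """
    if called in LOCAL_NUMBERS:
        row = A_PRIORI_TABLE["411 or 555-1212"]
        called_npa = calling[:3]  # assumption: a local call stays in the caller's NPA
    else:
        row = A_PRIORI_TABLE.get(called)
        if row is None:
            return None
        called_npa = called[:3]
    calling_npa_nxx = calling[:7]  # "514-421-7563" -> "514-421"
    if calling_npa_nxx in row:
        return row[calling_npa_nxx]  # NPA-NXX specific histogram
    return row.get(called_npa)  # fall back to the NPA histogram


print(histogram_index("411", "514-421-7563"))  # 5
```

The call at the end reproduces the worked example in the text: 411 dialed from 514-421-7563 selects histogram 5.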



In the preferred embodiment of this invention, two separate histogram sets can be used, depending on the particular case involved. The first histogram set includes a plurality of individual histograms, each associated with a given NPA. In a specific example that could apply to the province of Quebec, three histograms would be provided, for NPAs 514, 418 and 819 respectively. The histograms are illustrated below; each has N records and is therefore associated with a speech recognition dictionary containing N orthographies:



Locality index      Probability value

NPA 514
Locality (0)        0.055
Locality (1)        0.100
Locality (2)        0.050
Locality (3)        0.003
...
Locality (N)        0.040

NPA 418
Locality (0)        0.040
Locality (1)        0.008
Locality (2)        0.200
Locality (3)        0.001
...
Locality (N)        0.034

NPA 819
Locality (0)        0.045
Locality (1)        0.060
Locality (2)        0.005
Locality (3)        0.009
...
Locality (N)        0.013



The second histogram set includes an individual histo- 
gram for each NPA-NXX combination available. The fol- 
lowing is an example of two histograms: 



Locality index      Probability value

NPA-NXX 514-620
Locality (0)        0.028
Locality (1)        0.067
Locality (2)        0.012
Locality (3)        0.102
...
Locality (N)        0.083

NPA-NXX 819-820
Locality (0)        0.045
Locality (1)        0.003
Locality (2)        0.071
Locality (3)        0.001
...
Locality (N)        0.043
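The two histogram sets can be held as dictionaries keyed by locality index. The Python sketch below reproduces only the sample values for Locality (0) to Locality (3) and implements the per-locality lookup that FIG. 3 performs at step 412. The names `NPA_HISTOGRAMS`, `NPA_NXX_HISTOGRAMS` and `priors_for`, and the floor value returned for a locality absent from a histogram, are hypothetical; the text does not say what happens when the search fails.

```python
from typing import Dict, Iterable

Histogram = Dict[int, float]  # locality index -> a priori probability

# Sample values from the histograms above (Locality (0) to (3) only).
NPA_HISTOGRAMS: Dict[str, Histogram] = {
    "514": {0: 0.055, 1: 0.100, 2: 0.050, 3: 0.003},
    "418": {0: 0.040, 1: 0.008, 2: 0.200, 3: 0.001},
    "819": {0: 0.045, 1: 0.060, 2: 0.005, 3: 0.009},
}
NPA_NXX_HISTOGRAMS: Dict[str, Histogram] = {
    "514-620": {0: 0.028, 1: 0.067, 2: 0.012, 3: 0.102},
    "819-820": {0: 0.045, 1: 0.003, 2: 0.071, 3: 0.001},
}

FLOOR = 1e-4  # hypothetical prior for localities missing from a histogram


def priors_for(histogram: Histogram, localities: Iterable[int]) -> Dict[int, float]:
    """Look up the a priori probability of each dictionary locality."""
    return {loc: histogram.get(loc, FLOOR) for loc in localities}


print(priors_for(NPA_NXX_HISTOGRAMS["514-620"], [0, 3, 7]))
# {0: 0.028, 3: 0.102, 7: 0.0001}
```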






During operation of the speech recognition system, the information contained in these histograms is consulted and helps determine the orthography that the system will output as its top choice. The flowchart in FIG. 3 illustrates the orthography selection process in greater detail.

When the spoken utterance is received at step 400, the signal is scored on the basis of acoustic criteria, as shown at step 402. Algorithms such as the fast match algorithm may be used to perform this scoring. The fast match algorithm scores all the orthographies in the speech recognition dictionary. For the purpose of illustration, an ordered list of scored orthographies is shown in the table depicted in FIG. 3; at this stage, however, it is not necessary to order the list. For more information on the fast match algorithm, the reader may wish to consult Gupta V. N., Lennig M., Mermelstein P., "A fast search strategy in a large vocabulary word recogniser", INRS-Telecommunications, J. Acoust. Soc. Am. 84(6), December 1988, p. 2007, and U.S. Pat. No. 5,515,475 by inventors Gupta V. N. and Lennig M. The content of these documents is incorporated herein by reference. At step 404 of the process, the number that the user dialed to obtain directory assistance is analyzed. If this number is a local number (such as 411 or 555-1212 in the province of Quebec), in other words if no NPA has been dialed or the dialed NPA corresponds to the local region, conditional step 406 is answered in the affirmative. If any other number has been dialed, the conditional step branches to processing block A, which is discussed in greater detail later in connection with FIG. 4.
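The test made at steps 404 and 406 can be sketched as follows. This Python sketch assumes that the "local region" is the caller's own NPA and that the only directory assistance numbers are the Quebec forms listed in the text; `is_local_da_call` and `digits_only` are hypothetical names.

```python
LOCAL_DA_NUMBERS = {"411", "5551212"}  # 411 and 555-1212, digits only
DA_SUFFIX = "5551212"  # the 555-1212 part of NPA-555-1212


def digits_only(number: str) -> str:
    """Strip separators such as '-' from a dialed number."""
    return "".join(ch for ch in number if ch.isdigit())


def is_local_da_call(called: str, local_npa: str) -> bool:
    """Sketch of steps 404/406: True when no NPA was dialed, or when the
    dialed NPA is the caller's own (assumed to define the local region)."""
    d = digits_only(called)
    if d in LOCAL_DA_NUMBERS:
        return True
    if len(d) == 10 and d.endswith(DA_SUFFIX):
        return d[:3] == local_npa
    return False  # any other number branches to block A


print(is_local_da_call("514-555-1212", "514"), is_local_da_call("418-555-1212", "514"))
# True False
```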

Conditional step 408 determines whether the NPA-NXX of the telephone number of the user who has requested the directory assistance function is available and whether an a priori histogram exists for that NPA-NXX combination. If the NPA-NXX combination is not available or has not been programmed into the system, in other words if no histogram providing probability values based on calling patterns exists for this NPA-NXX combination, conditional step 408 is answered in the negative and processing continues at block B, which is described in greater detail later in connection with FIG. 5. However, if the NPA-NXX combination is available and a histogram exists for it in the system, the process continues at step 410, where the histogram is retrieved from memory and prepared for further processing. At step 412, the histogram is searched for each locality present in the dictionary. If the search is successful, the a priori probability associated with the locality name is obtained from the histogram and stored in memory. At step 414, the system computes a compound probability based on the probability value of the acoustic match and the probability value extracted from the a priori advisor. In the present embodiment, the following equation is used to compute the compound probabilities, also referred to in the literature as log likelihoods:

$$\log(\text{compound probability}) = \log(\text{acoustic probability}) + 0.007 \times \left[\text{number of speech frames} \times \log(\text{a priori probability})\right]$$
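The equation above translates directly into a one-line function. This Python sketch assumes natural logarithms (the text does not state the base); `compound_log_score` and the sample numbers are hypothetical.

```python
import math

PRIOR_WEIGHT = 0.007  # weighting factor from the equation above


def compound_log_score(acoustic_log_prob: float, num_frames: int, prior_prob: float) -> float:
    """log(compound) = log(acoustic) + 0.007 * [frames * log(a priori)].

    Natural logarithms are assumed; the text does not state the base.
    """
    return acoustic_log_prob + PRIOR_WEIGHT * (num_frames * math.log(prior_prob))


# Hypothetical values: acoustic log probability -500, 100 speech frames,
# and an a priori probability of 0.1.
score = compound_log_score(-500.0, 100, 0.1)
print(round(score, 3))  # -501.612
```

One plausible reading of the frame multiplier: the acoustic log probability accumulates over frames, so scaling the a priori term by the number of frames keeps the two contributions on a comparable scale regardless of utterance length. This is an interpretation, not a statement from the text.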

Once the compound probability values are obtained, the list of candidates is re-ordered at step 416. The re-ordered list is shown in FIG. 3. Compared with the original list, a number of entries have changed position: Locality(5), Locality(8) and Locality(1) now occupy the first three slots, whereas previously those positions were taken by Locality(0), Locality(5) and Locality(8).
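The re-ordering of step 416 amounts to sorting the candidates by compound score. The Python sketch below uses invented acoustic scores and priors, chosen only so that the result mirrors the change described for FIG. 3; the utterance length of 100 frames and natural logarithms are assumptions.

```python
import math
from typing import List, Tuple

PRIOR_WEIGHT = 0.007
NUM_FRAMES = 100  # hypothetical utterance length

# (locality, acoustic log probability, a priori probability): invented values,
# listed in acoustic order as produced at step 402.
candidates: List[Tuple[str, float, float]] = [
    ("Locality(0)", -1000.0, 0.001),
    ("Locality(5)", -1001.0, 0.100),
    ("Locality(8)", -1002.0, 0.200),
    ("Locality(1)", -1003.0, 0.300),
]


def compound(acoustic: float, prior: float) -> float:
    """Compound log score, following the equation given earlier."""
    return acoustic + PRIOR_WEIGHT * NUM_FRAMES * math.log(prior)


# Step 416: best compound score first.
reordered = sorted(candidates, key=lambda c: compound(c[1], c[2]), reverse=True)
print([name for name, _, _ in reordered])
# ['Locality(5)', 'Locality(8)', 'Locality(1)', 'Locality(0)']
```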

At step 418, the top three candidates in the re-ordered list are passed to the re-scoring stage 420, where a full acoustic match analysis with the spoken utterance is performed so that an orthography can be chosen as the best possible match to the spoken utterance. As a variant, the re-scoring stage may also use the a priori probabilities stored in the histograms, as shown by the dotted line between steps 412 and 420 in FIG. 3.
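Steps 418 and 420 can be sketched as follows. In this Python sketch, `detailed_score` is a hypothetical callback standing in for the full acoustic match analysis, which the text does not specify, and the optional `priors` argument models the dotted-line variant using the same frame-weighted term as the compound probability equation.

```python
import math
from typing import Callable, Dict, List, Optional


def pick_best(
    ranked: List[str],
    detailed_score: Callable[[str], float],
    n: int = 3,
    priors: Optional[Dict[str, float]] = None,
    num_frames: int = 0,
    weight: float = 0.007,
) -> Optional[str]:
    """Re-score only the top n candidates with a detailed acoustic match and
    return the best one; optionally fold the a priori term back in."""
    best_name, best_score = None, -math.inf
    for name in ranked[:n]:
        score = detailed_score(name)
        if priors is not None:  # dotted-line variant (412 -> 420)
            score += weight * num_frames * math.log(priors[name])
        if score > best_score:
            best_name, best_score = name, score
    return best_name


ranked = ["Locality(5)", "Locality(8)", "Locality(1)", "Locality(0)"]
detailed = {"Locality(5)": -10.0, "Locality(8)": -9.0,
            "Locality(1)": -12.0, "Locality(0)": -1.0}
print(pick_best(ranked, lambda name: detailed[name]))  # Locality(8)
```

Locality(0) has the best invented detailed score but is never re-scored, because it fell outside the top three after re-ordering.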



Returning to conditional step 406, should this step be answered in the negative, indicating a determination that the user has dialed the telephone number of the automated directory assistance system preceded by an NPA (area code), the process branches to flowchart block A (step 500), illustrated in FIG. 4. At step 502, the system uses the default histogram corresponding to that NPA, namely the NPA of the called number, not that of the user's telephone number.

Conditional step 408 is answered in the negative when the calling NPA-NXX combination is not available or not programmed in the system. In this case, the system defaults to a procedure that establishes the compound probability value based on the histogram associated with the called NPA rather than the calling NPA-NXX combination. Flowchart block B, illustrated in FIG. 5, describes this procedure in detail. More specifically, at functional block 600 the NPA of the called number is obtained. The histogram corresponding to this NPA is then obtained, as shown in block 602. Processing then resumes at step 412 in FIG. 3, where the compound probability values are computed from the probability value extracted from the histogram and the probability value corresponding to the acoustic match, and are used to re-order the candidate list.
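The histogram selection spread across FIGS. 3, 4 and 5 can be condensed into one function. This Python sketch assumes that, for a local call with no dialed NPA, the caller's own NPA is passed as `called_npa`; `select_histogram` and its parameters are hypothetical names.

```python
from typing import Dict, Optional

Histogram = Dict[int, float]  # locality index -> a priori probability


def select_histogram(
    is_local_call: bool,
    calling_npa_nxx: Optional[str],
    called_npa: str,
    npa_nxx_histograms: Dict[str, Histogram],
    npa_histograms: Dict[str, Histogram],
) -> Optional[Histogram]:
    """Choose the a priori histogram following FIG. 3 to FIG. 5.

    - Local call and a histogram exists for the calling NPA-NXX: use it
      (steps 408-410).
    - Otherwise fall back to the histogram of the called NPA: block A
      (step 502) when an NPA was dialed, block B (steps 600-602) when the
      calling NPA-NXX is unknown or not programmed.
    """
    if is_local_call and calling_npa_nxx in npa_nxx_histograms:
        return npa_nxx_histograms[calling_npa_nxx]
    return npa_histograms.get(called_npa)


nxx = {"514-620": {0: 0.028}}
npa = {"514": {0: 0.055}, "418": {0: 0.040}}
print(select_histogram(False, "514-620", "418", nxx, npa))  # {0: 0.04}
```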

In the example of operation of the speech recognition system depicted in FIGS. 3, 4 and 5, the probability derived from the histograms is applied to all the orthographies in the speech recognition dictionary before a set of candidates is selected. The operation of the system can be simplified by performing these computations separately. For example, the a priori probabilities may be extracted only for the top N orthographies. This can be done by identifying the applicable histogram, searching it for the relevant localities, and computing the log likelihoods only for the candidates in the list.
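The simplified variant can be sketched in Python as below: the N best localities are kept on acoustic score alone, and only those are looked up in the histogram and re-ranked by compound score. `rank_top_n`, the floor prior for a missing locality, natural logarithms and the sample numbers are assumptions for illustration.

```python
import heapq
import math
from typing import Dict, List

PRIOR_WEIGHT = 0.007
FLOOR = 1e-4  # hypothetical prior for a locality absent from the histogram


def rank_top_n(acoustic: Dict[int, float], histogram: Dict[int, float],
               num_frames: int, n: int) -> List[int]:
    """Keep the n best localities by acoustic log score, then look up priors
    and compute compound log scores for those n candidates only."""
    top = heapq.nlargest(n, acoustic.items(), key=lambda kv: kv[1])
    scored = [
        (loc, score + PRIOR_WEIGHT * num_frames * math.log(histogram.get(loc, FLOOR)))
        for loc, score in top
    ]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return [loc for loc, _ in scored]


acoustic = {0: -1000.0, 5: -1001.0, 8: -1002.0, 1: -1003.0, 2: -1010.0}
histogram = {0: 0.001, 5: 0.100, 8: 0.200, 1: 0.300, 2: 0.900}
print(rank_top_n(acoustic, histogram, num_frames=100, n=4))  # [5, 8, 1, 0]
```

Locality 2 is never looked up in the histogram, which is the saving this variant aims for.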

The above description of a preferred embodiment should 
not be interpreted in any limiting manner since variations 
and refinements can be made without departing from the 
spirit of the invention. For instance, although an example of 
the invention has been provided above with strong emphasis 
on an automated directory assistance system, the apparatus 
using an a priori advisor for the speech recognition dictio- 
nary could also be used in other types of speech recognition 
systems. The scope of the invention is defined in the 
appended claims and their equivalents. 

I claim: 

1. An automated directory assistance system comprising: 

a) a speech recognition dictionary including a plurality of 
vocabulary items potentially recognizable on a basis of 
a spoken utterance by a user of said automated direc- 
tory assistance system, each vocabulary item being 
indicative of a locality in which an entity whose 
telephone number potentially sought by the user may 
reside; 

b) extraction unit for extracting from said speech recog- 
nition dictionary on the basis of the spoken utterance by 
the user a plurality of vocabulary items, each of said 
plurality of vocabulary items being a candidate having 
a certain probability to correspond to the spoken 
utterance, said plurality of vocabulary items forming a 
list of candidates; 

c) a plurality of a priori data structures, each a priori data 
structure containing a plurality of probability data 
elements, the probability data elements being derived at 
least in part on a basis of call records indicative of prior 
automated directory assistance transactions; 

d) a selecting unit for selecting one of said a priori data 
structures; and 
e) a weighing unit for weighing candidates in said list on
a basis of probability data elements in said one of said 
a priori data structures. 

2. An automated directory assistance system as defined in claim 1, wherein said weighing unit for weighing the candidates establishes for each candidate a compound probability value that is indicative of a likelihood of match of the candidate with the spoken utterance, the compound probability depending in part on a degree of acoustic match between the candidate and the spoken utterance and depending in part on probability data elements in said one of said a priori data structures.

3. An automated directory assistance system as defined in claim 2, wherein said selecting unit is operative for processing data indicative of at least a portion of a telephone number of a terminal from which the user is inputting the spoken utterance to select one of said a priori data structures.

4. An automated directory assistance system as defined in claim 3, wherein said selecting unit is operative for processing an NPA-NXX of a telephone number of a terminal from which the user is inputting the spoken utterance to select one of said a priori data structures.

5. An automated directory assistance system as defined in claim 4, wherein said selecting unit comprises:

a) an input for receiving an NPA-NXX combination of a telephone number of a terminal from which the user is inputting the spoken utterance,

b) an identification unit for identifying an a priori data structure associated with said NPA-NXX combination in said plurality of a priori data structures, said weighing unit including:

a searching unit for searching the a priori data structure identified at paragraph b to extract therefrom a probability data element corresponding to at least one candidate in said list.

6. An automated directory assistance system as defined in 
claim 4, wherein each a priori data structure includes a 
plurality of indices, each index being associated with a 
corresponding probability data element. 

7. An automated directory assistance system as defined in claim 3, wherein said weighing unit is operative for retrieving a probability data element from said one of said a priori data structures for each candidate in said list.

8. An automated directory assistance system as defined in claim 7, wherein said directory assistance system includes a plurality of data structures, each data structure being associated with an identifier of geographical location from which a user may input the spoken utterance.

9. An automated directory assistance system as defined in claim 8, wherein said identifier is at least a portion of a telephone number.

10. An automated directory assistance system as defined 
in claim 8, comprising: 

a) an input for receiving data indicative of at least a portion of a telephone number of a terminal at which the user is inputting the spoken utterance,

b) an identification unit for identifying a data structure associated with the data indicative of at least a portion of a telephone number of a terminal at which the user is inputting the spoken utterance,

c) a search unit for searching the data structure identified at paragraph b to extract therefrom probability data corresponding to at least one candidate.

11. An automated directory assistance system as defined in claim 10, wherein said search unit for searching the data structure has the ability to search the data structure for each of said plurality of vocabulary items and output probability data associated with each candidate.

12. An automated directory assistance system as defined 
in claim 10, comprising a selecting unit for selecting N 
candidates from the list, where N is less than the total 
number of candidates in the list. 

13. An automated directory assistance system as defined 
in claim 12, comprising a re-scoring unit for re-scoring the 
N candidates selected in the list to determine for each of the 
N candidates a likelihood of match with the utterance on a 
basis of the degree of acoustic match between the utterance 
and the candidate. 

14. An automated directory assistance system as defined 
in claim 12, wherein N is in the range of 6 to 24. 

15. An automated directory assistance system as defined in claim 2, comprising a ranking unit for ranking the candidates in accordance with said compound probability value.

16. A method for performing speech recognition in an 
automated directory assistance system, said method com- 
prising the steps of: 

a) providing a speech recognition dictionary including a 
plurality of vocabulary items potentially recognizable 
on a basis of a spoken utterance by a user of said 
automated directory assistance system, each vocabu- 
lary item being indicative of a locality in which an 
entity whose telephone number potentially sought by 
the user may reside; 

b) receiving a spoken utterance by the user of the auto- 
mated directory assistance system; 

c) searching said dictionary to derive a list of vocabulary 
items, each vocabulary item in said dictionary being a 
candidate having a certain probability to be a match to 
the spoken utterance; 

d) selecting a certain a priori data structure from a 
plurality of a priori data structures on a basis of a 
geographical location associated with the user, the 
certain a priori data structure containing a plurality of 
a priori probability values, the plurality of probability 
values being derived at least in part on a basis of call 
records indicative of prior automated directory assis- 
tance transactions; and 

e) assigning to at least one vocabulary item in said list of 
vocabulary items an a priori probability value selected 
from said certain a priori data structure. 

17. A method for performing speech recognition in an 
automated directory assistance system, said method com- 
prising the steps of: 

a) providing a speech recognition dictionary including a 
plurality of vocabulary items potentially recognizable 
on a basis of a spoken utterance by a user of said 
automated directory assistance system, each vocabu- 
lary item being indicative of a locality in which an 
entity whose telephone number potentially sought by 
the user may reside; 

b) receiving an utterance spoken by the user of the 
automated directory assistance system; 

c) searching said dictionary to derive a list of vocabulary 
items, each vocabulary item in said list being a candi- 
date having a certain probability to be a match to the 
spoken utterance; 

d) obtaining a certain identifier indicative of a geographi- 
cal location of a terminal at which the user has input the 
spoken utterance; 

e) selecting a certain a priori data structure from a plurality of a priori data structures on a basis of the certain identifier, the certain a priori data structure containing a plurality of a priori probability values, the plurality of probability values being derived at least in part on a basis of call records indicative of prior automated directory assistance transactions; and

f) utilizing a priori probability values in the certain a priori data structure to rank the candidates derived at step c in terms of likelihood of potential match with the spoken utterance.

18. A method for performing speech recognition as defined in claim 17, comprising the steps of:

a) for each candidate in the list searching the certain a 
priori data structure to extract a corresponding prob- 
ability data element; and 

b) utilizing the probability data elements obtained at step 
a) to rank the candidates in said list in terms of 
likelihood of potential match with the spoken utterance. 

19. A method for performing speech recognition as defined in claim 18, comprising the steps of:

a) providing a plurality of a priori data structures, each data structure establishing a correspondence between a plurality of vocabulary items in said speech recognition dictionary and corresponding probability data elements, each a priori data structure being assigned an identifier representative of a geographical location at which is located a terminal at which the user inputs the utterance,

b) determining a value of said certain identifier; and

c) searching the data structure corresponding to said certain identifier to extract probability data associated to several ones of the candidates.

20. A method for performing speech recognition as defined in claim 19, comprising the step of determining at least a portion of a telephone number of the terminal at which the user inputs the utterance to determine the value of the certain identifier.

21. An automated directory assistance system comprising: 

a) a speech recognition dictionary including a plurality of 
vocabulary items potentially recognizable on a basis of 
a spoken utterance by a user of said automated direc- 
tory assistance system, each vocabulary item being 
indicative of a geographical area, 

b) a first search unit for extracting from said speech recognition dictionary on the basis of the spoken utterance by the user a list of vocabulary items, each vocabulary item in said list being a candidate having a certain probability to constitute a match to the spoken utterance,

c) a processing unit for deriving data indicative of a 
geographical location at which the user has formulated 
the utterance, 

d) a selecting unit for selecting from a plurality of a priori data structures a priori probability data elements on a basis of said geographical location at which the user has formulated the utterance, the probability data elements being derived at least in part on a basis of call records indicative of prior automated directory assistance transactions;

e) weighing unit for weighing candidates in said list of 
vocabulary items on a basis of said a priori data 
elements. 

22. A machine readable storage medium containing a speech recognition dictionary for use in an automated directory assistance system, said speech recognition dictionary including:

a) a plurality of vocabulary items potentially recognizable on a basis of a spoken utterance by a user of the automated directory assistance system, each vocabulary item being indicative of a locality in which an entity whose telephone number potentially sought by the user may reside;

b) a plurality of identifier elements, each identifier ele- 
ment being indicative of a geographical area at which 
a user of the automated directory assistance system 
may formulate the spoken utterance; 

c) a plurality of data structures associated with respective 
identifier elements, each data structure establishing a 
correspondence between several ones of said vocabu- 
lary items and probability data, said probability data 
allowing the automated directory assistance system to 
determine if a vocabulary item is a likely match to the 
spoken utterance. 

23. A machine readable medium containing a program 
element for instructing a computer for selecting at least one 
vocabulary item from a speech recognition dictionary as 
being a likely match to a given spoken utterance, said 
computer including: 

a) first memory unit containing the speech recognition 
dictionary; 

b) a processor in operative relationship with the first 
memory unit; 

c) said program element being operative for: 

i) directing the processor to select from the speech 
recognition dictionary a plurality of vocabulary 
items, the plurality of vocabulary items forming a list 
of candidates, each candidate having a certain prob- 
ability to correspond to the spoken utterance; 

ii) directing the processor to select from a plurality of a priori data structures a priori probability data elements related to an identifier indicative of a geographical location associated to the terminal at which the user has formulated the utterance, the probability data elements being derived at least in part on a basis of call records indicative of prior automated directory assistance transactions,

iii) directing the processor to weigh candidates in the 
list of candidates on a basis of the a priori data 
elements. 

24. An automated directory assistance system comprising: 

a) a speech recognition dictionary including a plurality of 
vocabulary items potentially recognizable on a basis of 
a spoken utterance by a user of said automated direc- 
tory assistance system, each vocabulary item being 
indicative of a locality in which an entity whose 
telephone number potentially sought by the user may 
reside; 

b) first processing unit for detecting at least a portion of 
a telephone number dialed by the user to access a 
directory assistance call function; 

c) second processing unit responsive to said at least a 
portion of said telephone number dialed by the user and 
to the spoken utterance for determining a probability 
value for at least one of said vocabulary items, the 
probability value being indicative of a likelihood of 
match between said at least one of said vocabulary 
items and the spoken utterance. 

25. An automated directory assistance system as defined 
in claim 24, wherein said at least a portion of a telephone 
number dialed by the user is an NPA portion of a telephone 
number permitting access to the directory assistance func- 
tion. 
26. An automated directory assistance system as defined
in claim 25, comprising a computing unit for computing said 
probability value for a plurality of vocabulary items. 

27. An automated directory assistance system as defined 
in claim 26, comprising a ranking unit for ranking vocabu- 
lary items on a basis of the computed probability values. 

28. A method for at least partially automating directory 
assistance in a telephone system, said method comprising 
the steps of: 

providing a plurality of vocabulary items potentially 
recognizable on a basis of a spoken utterance by a user 
of said automated directory assistance system, each 
vocabulary item being indicative of a locality in which 
an entity whose telephone number potentially sought 
by the user may reside; 

detecting a spoken utterance by a user; 

detecting at least a portion of a telephone number dialed 
by the user to access a directory assistance call func- 
tion; 
selecting at least in part on the basis of the spoken
utterance and at least in part on a basis of said at least 
a portion of a telephone number dialed by the user at 
least one of said vocabulary items as being a probable 
match to the spoken utterance. 

29. A method as defined in claim 28, wherein said at least 
a portion of a telephone number dialed by the user is an NPA 
portion of a telephone number permitting access to the 
directory assistance function. 

30. A method as defined in claim 29, comprising the step 
of determining a probability value for a plurality of vocabu- 
lary items. 

31. A method as defined in claim 30, comprising the step 
of ranking vocabulary items on a basis of the computed 
probability values. 